Generative AI in Croatian Education

A Media Frame Analysis (2023–2025)

Author

Media Analysis Research

Published

December 3, 2025

Executive Summary

This report analyzes Croatian web media coverage of Generative AI in education from 2023 to 2025. Using computational frame analysis and natural language processing, we examine how media narratives have evolved from initial panic to gradual integration.

Key Findings
  • Coverage volume: Substantial media attention with identifiable peaks around key events
  • Dominant frames: OPPORTUNITY and REGULATION frames predominate over THREAT
  • Narrative evolution: Clear shift from panic-focused to integration-focused coverage
  • Source variation: Significant differences in framing between outlet types

1 Introduction

1.1 Background

The release of ChatGPT in November 2022 triggered a global conversation about artificial intelligence in education. Croatia, like many countries, witnessed intense media debate about the implications of generative AI for students, teachers, and educational institutions.

1.2 Research Questions

This analysis addresses four core questions:

  1. Volume & Timing: How much coverage exists, and when did it peak?
  2. Framing: Which interpretive frames dominate, and how do they shift over time?
  3. Actors: Who is represented in coverage, and who is given voice?
  4. Sources: Do different media types frame AI in education differently?

1.3 Theoretical Framework

Our analysis draws on:

  • Framing Theory (Entman, 1993): Media frames as patterns of selection and emphasis
  • Moral Panic Theory (Cohen, 1972): Technology adoption often follows panic cycles
  • Diffusion of Innovations (Rogers, 1962): Media coverage mirrors adoption stages

2 Data and Methods

2.1 Data Source

Show code
# ==============================================================================
# DATA LOADING
# ==============================================================================

# Use the configured path, or try to find the file
if (exists("DATA_FILE_PATH") && file.exists(DATA_FILE_PATH)) {
  data_file <- DATA_FILE_PATH
} else {
  # Try multiple possible locations for the data file
  possible_paths <- c(
    "./dta.xlsx",
    "../dta.xlsx", 
    "dta.xlsx",
    file.path(getwd(), "dta.xlsx"),
    "D:/LUKA/Academic/HKS/Clanci/AI u obrazovanju/dta.xlsx"
  )
  
  data_file <- NULL
  for (path in possible_paths) {
    if (file.exists(path)) {
      data_file <- path
      break
    }
  }
  
  if (is.null(data_file)) {
    cat("Current working directory:", getwd(), "\n")
    cat("Files in current directory:\n")
    print(list.files(pattern = "\\.xlsx$", recursive = TRUE))
    stop("Could not find dta.xlsx. Set DATA_FILE_PATH in the setup chunk.")
  }
}

cat("Loading data from:", data_file, "\n")
Loading data from: ./dta.xlsx 
Show code
# Load the pre-processed data
raw_data <- read.xlsx(data_file)

cat("Dataset loaded successfully\n")
Dataset loaded successfully
Show code
cat("Total articles:", format(nrow(raw_data), big.mark = ","), "\n")
Total articles: 4,424 
Show code
cat("Columns:", ncol(raw_data), "\n")
Columns: 48 

2.2 Data Processing

Show code
# Validate and clean data
validated_data <- raw_data %>%
  filter(!is.na(FULL_TEXT) & !is.na(TITLE))

# Parse dates
clean_data <- validated_data %>%
  mutate(
    DATE = as.Date(DATE),
    year = year(DATE),
    month = month(DATE),
    year_month = floor_date(DATE, "month"),
    week = floor_date(DATE, "week"),
    quarter = quarter(DATE),
    word_count = str_count(FULL_TEXT, "\\S+"),
    article_id = row_number()
  ) %>%
  filter(!is.na(DATE)) %>%
  distinct(TITLE, DATE, .keep_all = TRUE) %>%
  arrange(DATE)

# Day of week
clean_data$day_of_week <- wday(clean_data$DATE, label = TRUE, abbr = FALSE)

cat("Articles after cleaning:", format(nrow(clean_data), big.mark = ","), "\n")
Articles after cleaning: 3,873 
Show code
cat("Date range:", as.character(min(clean_data$DATE)), "to", as.character(max(clean_data$DATE)), "\n")
Date range: 2021-01-01 to 2024-07-04 

2.3 Frame Dictionaries

Frame detection employs a dictionary-based approach, where each frame is operationalized through a curated set of Croatian-language keywords. For each article, we count occurrences of dictionary terms in the combined title and body text. An article is coded as containing a frame if at least one dictionary term is present. The dominant frame is determined by the highest keyword count across all eight frames.
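As a minimal illustration of this scoring rule, the sketch below uses hypothetical two-term mini-dictionaries and a made-up sentence; the full Croatian dictionaries are defined in the chunk that follows.

```r
library(stringr)

# Toy dictionaries -- illustrative only, not the full lists used in the report
toy_dicts <- list(
  THREAT      = c("opasnost", "plagijat"),
  OPPORTUNITY = c("prilika", "alat")
)
sentence <- "ChatGPT je prilika, ali i opasnost: alat koji mijenja nastavu."

# Count dictionary hits per frame (leading \\b anchors matches to word starts)
counts <- sapply(toy_dicts, function(words) {
  sum(str_count(str_to_lower(sentence),
                paste0("\\b(", paste(words, collapse = "|"), ")")))
})
counts                               # THREAT = 1, OPPORTUNITY = 2
names(counts)[which.max(counts)]     # dominant frame: "OPPORTUNITY"
```

Under this rule the toy article is coded as containing both THREAT and OPPORTUNITY (presence), while OPPORTUNITY is its dominant frame (highest count).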

Show code
frame_dictionaries <- list(
  THREAT = c(
    "prijetnja", "opasnost", "opasno", "rizik", "rizično",
    "varanje", "varati", "prevara", "plagijat", "plagiranje",
    "prepisivanje", "zabrana", "zabraniti", "zabranjeno",
    "uništiti", "uništava", "smrt", "kraj", "propast",
    "kriza", "alarm", "upozorenje", "šteta", "štetno",
    "strah", "bojati", "panika"
  ),
  
  OPPORTUNITY = c(
    "alat", "sredstvo", "pomoć", "pomoćnik", "asistent",
    "prilika", "mogućnost", "potencijal", "prednost", "korist",
    "poboljšati", "poboljšanje", "unaprijediti", "napredak",
    "učinkovit", "učinkovitost", "efikasan", "produktivnost",
    "budućnost", "inovacija", "inovativan", "revolucija",
    "moderan", "modernizacija", "transformacija",
    "uspjeh", "uspješno", "izvrsno"
  ),
  
  REGULATION = c(
    "pravilnik", "pravilo", "propisi", "regulativa",
    "smjernice", "upute", "protokol",
    "zakon", "zakonski", "pravni",
    "ministarstvo", "ministar", "vlada",
    "dopušteno", "dopuštenje", "dozvola",
    "primjena", "provedba", "implementacija",
    "odluka", "mjera"
  ),
  
  DISRUPTION = c(
    "promjena", "promijeniti", "transformacija", "preobrazba",
    "prilagodba", "prilagoditi", "adaptacija",
    "neizbježno", "nezaustavljivo",
    "revolucija", "prekretnica", "nova era", "novi način",
    "evolucija", "disrupcija"
  ),
  
  REPLACEMENT = c(
    "zamjena", "zamijeniti", "zamjenjuje", "istisnuti",
    "gubitak posla", "nepotreban", "suvišan", "zastario",
    "automatizacija", "automatizirano",
    "nadmašiti", "bolji od čovjeka"
  ),
  
  QUALITY = c(
    "halucinacija", "halucinacije", "greška", "greške",
    "netočno", "netočnost", "pogrešno",
    "pouzdanost", "pouzdan", "nepouzdan",
    "provjera", "provjeriti", "verifikacija",
    "kvaliteta", "kritički", "kritičko mišljenje"
  ),
  
  EQUITY = c(
    "nejednakost", "nejednako", "jaz", "razlika",
    "pristup", "pristupačnost", "dostupnost",
    "digitalni jaz", "siromašan", "socioekonomski",
    "pravednost", "pravedno", "nepravedno"
  ),
  
  COMPETENCE = c(
    "vještine", "vještina", "kompetencije",
    "sposobnost", "pismenost", "digitalna pismenost",
    "kritičko mišljenje", "analitičko mišljenje",
    "učiti", "obrazovanje", "edukacija", "usavršavanje"
  )
)

# Actor dictionaries
actor_dictionaries <- list(
  STUDENTS = c("student", "studenti", "učenik", "učenici", "đak", "maturant", "brucoš"),
  TEACHERS = c("učitelj", "učitelji", "nastavnik", "profesor", "profesori", "predavač", "mentor"),
  ADMINISTRATORS = c("ravnatelj", "dekan", "rektor", "prorektor", "voditelj"),
  INSTITUTIONS = c("škola", "škole", "fakultet", "sveučilište", "ministarstvo", "carnet"),
  TECH_COMPANIES = c("openai", "microsoft", "google", "chatgpt", "gpt", "gemini", "copilot"),
  EXPERTS = c("stručnjak", "ekspert", "znanstvenik", "istraživač", "analitičar"),
  POLICY_MAKERS = c("ministar", "zastupnik", "premijer", "vlada", "sabor")
)

# Sentiment dictionaries
sentiment_positive <- c(
  "dobar", "dobro", "odličan", "sjajan", "izvrstan", "fantastičan",
  "pozitivan", "uspješan", "uspjeh", "napredak", "poboljšanje",
  "zadovoljan", "optimizam", "nada", "kvalitetan", "koristan"
)

sentiment_negative <- c(
  "loš", "loše", "negativan", "grozan", "užasan", "katastrofa",
  "problem", "neuspjeh", "propast", "pogoršanje",
  "nezadovoljan", "razočaran", "pesimizam", "strah",
  "nekvalitetan", "beskoristan"
)

cat("Frame dictionaries created:", length(frame_dictionaries), "frames\n")
Frame dictionaries created: 8 frames
Show code
cat("Actor dictionaries created:", length(actor_dictionaries), "actor types\n")
Actor dictionaries created: 7 actor types

2.4 Frame Detection

The detection algorithm iterates through each article, applying regular-expression matching anchored with a leading word boundary (\b): dictionary terms match only at word starts, while inflected Croatian endings (e.g., "opasnost" matching "opasnosti") are still captured. Sentiment is computed as a simple difference score: positive word count − negative word count. Articles scoring above +2 are classified as positive, below −2 as negative, and the remainder as neutral. This threshold-based approach absorbs minor lexical fluctuations while still capturing meaningful sentiment differences.
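A minimal sketch of the sentiment rule, using hypothetical two-word lexicons (the full positive and negative lists are defined in section 2.3 above):

```r
library(stringr)

pos <- c("uspjeh", "napredak")   # toy positive lexicon
neg <- c("problem", "strah")     # toy negative lexicon
text <- "uspjeh i napredak, unatoč jednom problemu"

count_hits <- function(txt, words) {
  sum(str_count(str_to_lower(txt),
                paste0("\\b(", paste(words, collapse = "|"), ")")))
}

score <- count_hits(text, pos) - count_hits(text, neg)   # 2 - 1 = 1
category <- if (score > 2) "Positive" else if (score < -2) "Negative" else "Neutral"
category   # "Neutral": |score| <= 2 falls inside the neutral band
```

Note that "problemu" still counts as a negative hit: the leading-boundary-only pattern deliberately matches inflected forms.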

Show code
# Count occurrences of each frame's dictionary terms in a text.
# Patterns use only a leading \\b: terms match at word starts but still
# catch Croatian inflectional endings (e.g., "opasnost" -> "opasnosti").
detect_frames <- function(text, dictionaries) {
  if (is.na(text)) return(setNames(rep(0, length(dictionaries)), names(dictionaries)))
  text_lower <- str_to_lower(text)
  sapply(names(dictionaries), function(frame_name) {
    pattern <- paste0("\\b(", paste(dictionaries[[frame_name]], collapse = "|"), ")")
    sum(str_count(text_lower, pattern))
  })
}

# Binary indicator: does the text contain at least one term per frame?
detect_frame_presence <- function(text, dictionaries) {
  if (is.na(text)) return(setNames(rep(FALSE, length(dictionaries)), names(dictionaries)))
  text_lower <- str_to_lower(text)
  sapply(names(dictionaries), function(frame_name) {
    pattern <- paste0("\\b(", paste(dictionaries[[frame_name]], collapse = "|"), ")")
    str_detect(text_lower, pattern)
  })
}

# Apply frame analysis (with progress indicator)
message("Applying frame analysis...")

frame_results <- lapply(seq_len(nrow(clean_data)), function(i) {
  combined_text <- paste(clean_data$TITLE[i], clean_data$FULL_TEXT[i], sep = " ")
  
  frame_counts <- detect_frames(combined_text, frame_dictionaries)
  frame_presence <- detect_frame_presence(combined_text, frame_dictionaries)
  actor_counts <- detect_frames(combined_text, actor_dictionaries)
  actor_presence <- detect_frame_presence(combined_text, actor_dictionaries)
  
  # Sentiment
  text_lower <- str_to_lower(combined_text)
  pos_count <- sum(str_count(text_lower, paste0("\\b(", paste(sentiment_positive, collapse = "|"), ")")))
  neg_count <- sum(str_count(text_lower, paste0("\\b(", paste(sentiment_negative, collapse = "|"), ")")))
  
  c(
    setNames(frame_counts, paste0("frame_", names(frame_counts), "_count")),
    setNames(frame_presence, paste0("frame_", names(frame_presence), "_present")),
    setNames(actor_counts, paste0("actor_", names(actor_counts), "_count")),
    setNames(actor_presence, paste0("actor_", names(actor_presence), "_present")),
    sentiment_POSITIVE_count = pos_count,
    sentiment_NEGATIVE_count = neg_count
  )
})

frame_df <- bind_rows(lapply(frame_results, as.data.frame.list))
clean_data <- bind_cols(clean_data, frame_df)

# Calculate derived metrics
clean_data <- clean_data %>%
  mutate(
    dominant_frame = apply(
      select(., starts_with("frame_") & ends_with("_count") & !contains("frame_count")), 1,
      function(x) {
        frame_names <- c("THREAT", "OPPORTUNITY", "REGULATION", "DISRUPTION", 
                         "REPLACEMENT", "QUALITY", "EQUITY", "COMPETENCE")
        if (all(x == 0)) return("NONE")
        frame_names[which.max(x)]
      }
    ),
    frame_intensity = rowSums(select(., starts_with("frame_") & ends_with("_count") & !contains("frame_count"))),
    frame_count = rowSums(select(., starts_with("frame_") & ends_with("_present"))),
    sentiment_score = sentiment_POSITIVE_count - sentiment_NEGATIVE_count,
    sentiment_category = case_when(
      sentiment_score > 2 ~ "Positive",
      sentiment_score < -2 ~ "Negative",
      TRUE ~ "Neutral"
    ),
    primary_actor = apply(
      select(., starts_with("actor_") & ends_with("_count")), 1,
      function(x) {
        actor_names <- c("STUDENTS", "TEACHERS", "ADMINISTRATORS", "INSTITUTIONS",
                         "TECH_COMPANIES", "EXPERTS", "POLICY_MAKERS")
        if (all(x == 0)) return("NONE")
        actor_names[which.max(x)]
      }
    ),
    narrative_phase = case_when(
      DATE < as.Date("2023-06-01") ~ "Phase 1: Emergence",
      DATE < as.Date("2024-01-01") ~ "Phase 2: Debate",
      DATE < as.Date("2024-09-01") ~ "Phase 3: Integration",
      TRUE ~ "Phase 4: Normalization"
    )
  )

cat("Frame analysis complete.\n")
Frame analysis complete.
Show code
cat("Articles with at least one frame:", sum(clean_data$frame_count > 0), "\n")
Articles with at least one frame: 3154 

3 Results

3.1 Coverage Overview

This section presents descriptive statistics characterizing the corpus. The percentage of articles with at least one detected frame indicates dictionary coverage—values below 70% may suggest dictionary expansion is needed.

3.1.1 Dataset Summary

Show code
summary_stats <- tibble(
  Metric = c(
    "Total Articles",
    "Date Range",
    "Unique Sources",
    "Total Words Analyzed",
    "Mean Article Length (words)",
    "Articles with Frame Detected"
  ),
  Value = c(
    format(nrow(clean_data), big.mark = ","),
    paste(min(clean_data$DATE), "to", max(clean_data$DATE)),
    format(n_distinct(clean_data$FROM), big.mark = ","),
    format(sum(clean_data$word_count, na.rm = TRUE), big.mark = ","),
    format(round(mean(clean_data$word_count, na.rm = TRUE)), big.mark = ","),
    paste0(format(sum(clean_data$frame_count > 0), big.mark = ","), 
           " (", round(mean(clean_data$frame_count > 0) * 100, 1), "%)")
  )
)

kable(summary_stats, align = c("l", "r")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 1: Dataset Overview
Metric Value
Total Articles 3,873
Date Range 2021-01-01 to 2024-07-04
Unique Sources 512
Total Words Analyzed 1,483,958
Mean Article Length (words) 383
Articles with Frame Detected 3,154 (81.4%)

3.1.2 Temporal Distribution

The bar chart displays monthly article counts, with a LOESS smoothing curve (red) indicating the underlying trend. Peaks correspond to external events (e.g., ChatGPT's launch, academic calendar milestones). The LOESS span is left at ggplot2's default (0.75), which balances noise reduction against trend fidelity.

Show code
monthly_stats <- clean_data %>%
  group_by(year_month) %>%
  summarise(
    n_articles = n(),
    prop_THREAT = mean(frame_THREAT_present, na.rm = TRUE),
    prop_OPPORTUNITY = mean(frame_OPPORTUNITY_present, na.rm = TRUE),
    prop_REGULATION = mean(frame_REGULATION_present, na.rm = TRUE),
    mean_sentiment = mean(sentiment_score, na.rm = TRUE),
    .groups = "drop"
  )

ggplot(monthly_stats, aes(x = year_month, y = n_articles)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  geom_smooth(method = "loess", se = TRUE, color = "#d7191c", linewidth = 1.2) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
  labs(
    title = "Media Coverage of AI in Croatian Education",
    subtitle = "Monthly article count with trend line",
    x = NULL, 
    y = "Number of Articles"
  )
Figure 1: Monthly Coverage Volume

3.1.3 Day of Week Patterns

Publication timing reveals editorial routines. Weekday concentration suggests news-driven coverage, while weekend spikes may indicate feature or opinion pieces. Percentages sum to 100% across all days.

Show code
dow_stats <- clean_data %>%
  filter(!is.na(day_of_week)) %>%
  count(day_of_week) %>%
  mutate(percentage = n / sum(n) * 100)

ggplot(dow_stats, aes(x = day_of_week, y = n)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  geom_text(aes(label = paste0(round(percentage, 1), "%")), vjust = -0.5, size = 3.5) +
  labs(
    title = "Publication Day Patterns",
    x = NULL, 
    y = "Number of Articles"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 2: Publication Patterns by Day of Week

3.2 Frame Analysis

Frame analysis quantifies how media construct meaning around AI in education. Each article receives a dominant frame assignment based on maximum keyword frequency. Articles with zero matches across all dictionaries are coded as “NONE.”

3.2.1 Dominant Frames

The horizontal bar chart ranks frames by article count. Percentages indicate each frame’s share of total coverage. A high “NONE” proportion signals potential dictionary gaps or genuinely frame-neutral content.

Show code
frame_dist <- clean_data %>%
  count(dominant_frame, sort = TRUE) %>%
  mutate(
    percentage = n / sum(n) * 100,
    dominant_frame = factor(dominant_frame, levels = dominant_frame)
  )

ggplot(frame_dist, aes(x = reorder(dominant_frame, n), y = n, fill = dominant_frame)) +
  geom_col() +
  geom_text(aes(label = paste0(round(percentage, 1), "%")), hjust = -0.1, size = 3.5) +
  scale_fill_manual(values = frame_colors) +
  coord_flip() +
  labs(
    title = "Distribution of Dominant Frames",
    subtitle = "Based on highest frame word count per article",
    x = NULL, 
    y = "Number of Articles"
  ) +
  theme(legend.position = "none") +
  expand_limits(y = max(frame_dist$n) * 1.15)
Figure 3: Distribution of Dominant Frames

3.2.2 Frame Evolution Over Time

This time series tracks the proportion of articles containing each frame per month (not mutually exclusive—articles may contain multiple frames). Rising lines indicate increasing frame salience; convergence suggests narrative consolidation.

Show code
frame_evolution <- monthly_stats %>%
  select(year_month, prop_THREAT, prop_OPPORTUNITY, prop_REGULATION) %>%
  pivot_longer(-year_month, names_to = "frame", values_to = "proportion") %>%
  mutate(frame = str_remove(frame, "prop_"))

ggplot(frame_evolution, aes(x = year_month, y = proportion, color = frame)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  scale_color_manual(values = frame_colors) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
  labs(
    title = "Frame Prevalence Over Time",
    subtitle = "Proportion of articles containing each frame",
    x = NULL, 
    y = "Proportion of Articles",
    color = "Frame"
  )
Figure 4: Evolution of Media Frames Over Time

3.2.3 Frame Co-occurrence

The heatmap displays normalized co-occurrence frequencies. Values are row-normalized by diagonal (self-occurrence), so each cell represents: P(Frame₂ | Frame₁). Values approaching 1.0 indicate frames that frequently appear together; the diagonal is always 1.0 by definition.

Show code
frame_cols <- clean_data %>%
  select(starts_with("frame_") & ends_with("_present"))

if (ncol(frame_cols) > 1) {
  frame_cooccur <- crossprod(as.matrix(frame_cols))
  diag_vals <- diag(frame_cooccur)
  diag_vals[diag_vals == 0] <- 1
  frame_cooccur_norm <- frame_cooccur / diag_vals
  
  frame_cooccur_df <- as.data.frame(frame_cooccur_norm)
  frame_cooccur_df$frame1 <- rownames(frame_cooccur_df)
  frame_cooccur_df <- frame_cooccur_df %>%
    pivot_longer(-frame1, names_to = "frame2", values_to = "cooccurrence") %>%
    mutate(
      frame1 = str_extract(frame1, "(?<=frame_)[A-Z]+"),
      frame2 = str_extract(frame2, "(?<=frame_)[A-Z]+")
    ) %>%
    filter(!is.na(frame1) & !is.na(frame2))
  
  ggplot(frame_cooccur_df, aes(x = frame1, y = frame2, fill = cooccurrence)) +
    geom_tile(color = "white") +
    geom_text(aes(label = round(cooccurrence, 2)), size = 3) +
    scale_fill_viridis_c(option = "magma") +
    labs(
      title = "Frame Co-occurrence Matrix",
      subtitle = "Normalized by diagonal (self-occurrence)",
      x = NULL, y = NULL, fill = "Co-occurrence"
    ) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
Figure 5: Frame Co-occurrence Matrix

3.3 Narrative Phases

We partition the study period into four theoretically motivated phases grounded in moral panic and diffusion theory. Phase boundaries are set a priori at expected narrative transitions: emergence (initial reaction), debate (contested meaning), integration (policy response), and normalization (routinization). The grouped bar chart enables cross-phase comparison of frame prevalence.

Show code
phase_stats <- clean_data %>%
  group_by(narrative_phase) %>%
  summarise(
    n = n(),
    threat = mean(frame_THREAT_present, na.rm = TRUE) * 100,
    opportunity = mean(frame_OPPORTUNITY_present, na.rm = TRUE) * 100,
    regulation = mean(frame_REGULATION_present, na.rm = TRUE) * 100,
    .groups = "drop"
  ) %>%
  mutate(narrative_phase = factor(narrative_phase, levels = c(
    "Phase 1: Emergence", "Phase 2: Debate", "Phase 3: Integration", "Phase 4: Normalization"
  )))

phase_long <- phase_stats %>%
  select(narrative_phase, threat, opportunity, regulation) %>%
  pivot_longer(-narrative_phase, names_to = "frame", values_to = "percentage") %>%
  mutate(frame = str_to_title(frame))

ggplot(phase_long, aes(x = narrative_phase, y = percentage, fill = frame)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("Threat" = "#e41a1c", "Opportunity" = "#4daf4a", "Regulation" = "#377eb8")) +
  labs(
    title = "Frame Distribution by Narrative Phase",
    subtitle = "How dominant frames shift across coverage periods",
    x = NULL, 
    y = "Percentage of Articles",
    fill = "Frame"
  ) +
  theme(axis.text.x = element_text(angle = 15, hjust = 1))
Figure 6: Frame Distribution by Narrative Phase
Show code
phase_table <- clean_data %>%
  group_by(narrative_phase) %>%
  summarise(
    `Articles` = n(),
    `Date Range` = paste(min(DATE), "—", max(DATE)),
    `Mean Sentiment` = round(mean(sentiment_score, na.rm = TRUE), 2),
    `% Threat Frame` = paste0(round(mean(frame_THREAT_present, na.rm = TRUE) * 100, 1), "%"),
    `% Opportunity Frame` = paste0(round(mean(frame_OPPORTUNITY_present, na.rm = TRUE) * 100, 1), "%"),
    .groups = "drop"
  )

kable(phase_table) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 2: Summary Statistics by Narrative Phase
narrative_phase Articles Date Range Mean Sentiment % Threat Frame % Opportunity Frame
Phase 1: Emergence 2659 2021-01-01 — 2023-05-31 0.07 37.4% 62.2%
Phase 2: Debate 1050 2023-06-01 — 2023-12-30 0.31 36.7% 70.4%
Phase 3: Integration 164 2024-01-02 — 2024-07-04 0.28 30.5% 73.2%

3.4 Sentiment Analysis

Sentiment provides an aggregate valence measure independent of specific frames. The trajectory plot shows monthly mean sentiment; green shading indicates positive territory, red indicates negative. Zero represents neutral balance between positive and negative lexicon matches.

Show code
ggplot(monthly_stats, aes(x = year_month)) +
  geom_ribbon(aes(ymin = 0, ymax = pmax(mean_sentiment, 0)), fill = "#4daf4a", alpha = 0.5) +
  geom_ribbon(aes(ymin = pmin(mean_sentiment, 0), ymax = 0), fill = "#e41a1c", alpha = 0.5) +
  geom_line(aes(y = mean_sentiment), linewidth = 1.2, color = "black") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
  labs(
    title = "Sentiment Trajectory Over Time",
    subtitle = "Mean sentiment score (positive − negative word counts)",
    x = NULL, 
    y = "Mean Sentiment Score"
  )
Figure 7: Sentiment Trajectory Over Time
Show code
sentiment_dist <- clean_data %>%
  count(sentiment_category) %>%
  mutate(percentage = n / sum(n) * 100)

ggplot(sentiment_dist, aes(x = sentiment_category, y = n, fill = sentiment_category)) +
  geom_col() +
  geom_text(aes(label = paste0(round(percentage, 1), "%")), vjust = -0.3, size = 4) +
  scale_fill_manual(values = sentiment_colors) +
  labs(
    title = "Sentiment Distribution",
    x = NULL, 
    y = "Number of Articles"
  ) +
  theme(legend.position = "none") +
  expand_limits(y = max(sentiment_dist$n) * 1.1)
Figure 8: Distribution of Sentiment Categories

3.5 Actor Representation

Actor analysis identifies who is discussed in coverage. The primary actor is assigned based on highest mention count per article. This reveals whose perspectives dominate discourse and potential imbalances in voice allocation.

Show code
actor_frequency <- clean_data %>%
  summarise(
    across(starts_with("actor_") & ends_with("_count"), sum),
    across(starts_with("actor_") & ends_with("_present"), sum)
  ) %>%
  pivot_longer(everything(), names_to = "metric", values_to = "value") %>%
  mutate(
    type = ifelse(str_detect(metric, "_count$"), "Total Mentions", "Articles Present"),
    # [A-Z_]*[A-Z] stops at the last uppercase letter, so multi-word actors
    # such as TECH_COMPANIES keep no trailing underscore in their labels
    actor = str_extract(metric, "(?<=actor_)[A-Z_]*[A-Z]") %>%
      str_replace_all("_", " ") %>%
      str_to_title()
  ) %>%
  filter(type == "Total Mentions") %>%
  arrange(desc(value))

ggplot(actor_frequency, aes(x = reorder(actor, value), y = value)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  coord_flip() +
  labs(
    title = "Actor Representation in Coverage",
    subtitle = "Total mentions across all articles",
    x = NULL, 
    y = "Total Mentions"
  )
Figure 9: Actor Representation in Coverage

The actor-frame association chart shows which frames co-occur with each primary actor. This reveals differential framing: for instance, if policy makers are more frequently associated with regulation frames, this suggests their media presence centers on governance rather than opportunity or threat narratives.

Show code
actor_frame_assoc <- clean_data %>%
  filter(primary_actor != "NONE") %>%
  group_by(primary_actor) %>%
  summarise(
    n = n(),
    threat = mean(frame_THREAT_present, na.rm = TRUE) * 100,
    opportunity = mean(frame_OPPORTUNITY_present, na.rm = TRUE) * 100,
    regulation = mean(frame_REGULATION_present, na.rm = TRUE) * 100,
    .groups = "drop"
  ) %>%
  arrange(desc(n))

actor_frame_long <- actor_frame_assoc %>%
  select(primary_actor, threat, opportunity, regulation) %>%
  pivot_longer(-primary_actor, names_to = "frame", values_to = "percentage") %>%
  mutate(frame = str_to_title(frame))

ggplot(actor_frame_long, aes(x = reorder(primary_actor, percentage), y = percentage, fill = frame)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("Threat" = "#e41a1c", "Opportunity" = "#4daf4a", "Regulation" = "#377eb8")) +
  coord_flip() +
  labs(
    title = "Frame Prevalence by Primary Actor",
    subtitle = "Which frames appear when each actor is prominent",
    x = NULL, 
    y = "Percentage of Articles",
    fill = "Frame"
  )
Figure 10: Actor-Frame Associations

3.6 Source Analysis

Outlet classification enables comparison across media types. Sources are categorized via pattern matching on domain names into tabloid, quality, regional, public, tech, education, and business press. The “Other” category captures unclassified sources. Frame prevalence differences across outlet types indicate systematic variation in news construction.

Show code
# Classify outlets
outlet_classification <- tribble(
  ~pattern,                    ~outlet_type,
  "24sata",                    "Tabloid",
  "index",                     "Tabloid",
  "jutarnji",                  "Quality",
  "vecernji",                  "Quality",
  "slobodna.*dalmacija",       "Regional",
  "novi.*list",                "Regional",
  "dnevnik",                   "Quality",
  "hrt",                       "Public",
  "n1",                        "Quality",
  "net\\.hr",                  "Tabloid",
  "tportal",                   "Quality",
  "bug",                       "Tech",
  "skolski.*portal",           "Education",
  "srednja",                   "Education",
  "poslovni",                  "Business",
  "lider",                     "Business"
)

clean_data$outlet_type <- "Other"
for (i in seq_len(nrow(outlet_classification))) {
  matches <- str_detect(str_to_lower(clean_data$FROM), outlet_classification$pattern[i])
  clean_data$outlet_type[matches] <- outlet_classification$outlet_type[i]
}
Show code
outlet_type_stats <- clean_data %>%
  group_by(outlet_type) %>%
  summarise(
    n_articles = n(),
    pct_threat = mean(frame_THREAT_present, na.rm = TRUE) * 100,
    pct_opportunity = mean(frame_OPPORTUNITY_present, na.rm = TRUE) * 100,
    pct_regulation = mean(frame_REGULATION_present, na.rm = TRUE) * 100,
    mean_sentiment = mean(sentiment_score, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(n_articles))

outlet_type_long <- outlet_type_stats %>%
  select(outlet_type, pct_threat, pct_opportunity, pct_regulation) %>%
  pivot_longer(-outlet_type, names_to = "frame", values_to = "percentage") %>%
  mutate(frame = str_remove(frame, "pct_") %>% str_to_title())

ggplot(outlet_type_long, aes(x = reorder(outlet_type, percentage), y = percentage, fill = frame)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("Threat" = "#e41a1c", "Opportunity" = "#4daf4a", "Regulation" = "#377eb8")) +
  coord_flip() +
  labs(
    title = "Frame Usage by Outlet Type",
    subtitle = "How different media types frame AI in education",
    x = NULL, 
    y = "Percentage of Articles",
    fill = "Frame"
  )
Figure 11: Coverage by Outlet Type
Show code
outlet_summary <- outlet_type_stats %>%
  mutate(
    `Mean Sentiment` = round(mean_sentiment, 2),
    `% Threat` = paste0(round(pct_threat, 1), "%"),
    `% Opportunity` = paste0(round(pct_opportunity, 1), "%"),
    `% Regulation` = paste0(round(pct_regulation, 1), "%")
  ) %>%
  select(
    `Outlet Type` = outlet_type,
    Articles = n_articles,
    `Mean Sentiment`,
    `% Threat`,
    `% Opportunity`,
    `% Regulation`
  )

kable(outlet_summary) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 3: Summary Statistics by Outlet Type
Outlet Type Articles Mean Sentiment % Threat % Opportunity % Regulation
Other 2879 0.17 35% 61.8% 22.9%
Quality 365 -0.12 46% 78.9% 31.8%
Regional 192 0.26 44.8% 58.9% 23.4%
Tabloid 172 0.23 31.4% 62.2% 23.8%
Education 91 0.22 46.2% 91.2% 37.4%
Business 76 0.12 57.9% 90.8% 39.5%
Tech 50 0.32 26% 88% 30%
Public 48 -0.12 31.2% 62.5% 18.8%

3.7 Statistical Tests

Statistical tests provide inferential support for observed patterns. We employ non-parametric and parametric tests appropriate to variable types.

3.7.1 Frame-Outlet Association

A chi-square test of independence assesses whether the distribution of dominant frames varies significantly across outlet types. A significant result (p < 0.05) indicates that outlet type and frame usage are not independent; different media types systematically prefer different frames. Note: cells with expected counts below 5 make the asymptotic chi-square approximation unreliable, and a Monte Carlo p-value is a common remedy.

Show code
frame_outlet_table <- table(clean_data$dominant_frame, clean_data$outlet_type)
chisq_result <- chisq.test(frame_outlet_table)

cat("Chi-Square Test: Dominant Frame vs. Outlet Type\n")
Chi-Square Test: Dominant Frame vs. Outlet Type
Show code
cat("X² =", round(chisq_result$statistic, 2), "\n")
X² = 210.9 
Show code
cat("df =", chisq_result$parameter, "\n")
df = 56 
Show code
cat("p-value =", format(chisq_result$p.value, scientific = TRUE), "\n")
p-value = 8.234689e-20 
Show code
if (chisq_result$p.value < 0.05) {
  cat("\nResult: Significant association between outlet type and frame usage (p < 0.05)\n")
}

Result: Significant association between outlet type and frame usage (p < 0.05)
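The expected-count caveat above can be checked directly. The sketch below uses a hypothetical toy contingency table (in the report it would be run on frame_outlet_table and chisq_result from the chunk above); when any expected cell falls below 5, chisq.test's simulate.p.value option yields a Monte Carlo p-value that does not rely on the asymptotic approximation.

```r
# Toy contingency table (frames x outlet types) -- illustrative values only
toy_tbl <- matrix(c(40, 3, 12, 2, 25, 1), nrow = 2,
                  dimnames = list(frame  = c("THREAT", "OPPORTUNITY"),
                                  outlet = c("Tabloid", "Quality", "Tech")))

res <- suppressWarnings(chisq.test(toy_tbl))
min(res$expected)   # smallest expected cell count; below 5 here

if (min(res$expected) < 5) {
  # Monte Carlo p-value: resample B tables with the same margins
  res_mc <- chisq.test(toy_tbl, simulate.p.value = TRUE, B = 10000)
  res_mc$p.value
}
```

If the simulated and asymptotic p-values agree, the original conclusion stands; if they diverge, the sparse cells are driving the result.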

3.7.2 Sentiment by Phase

One-way ANOVA tests whether mean sentiment differs significantly across narrative phases. A significant F-statistic indicates at least one phase differs from others. Tukey’s HSD post-hoc test identifies specific pairwise differences while controlling family-wise error rate. Positive differences indicate the first-named phase has higher sentiment than the second.

Show code
anova_result <- aov(sentiment_score ~ narrative_phase, data = clean_data)
anova_summary <- summary(anova_result)

cat("ANOVA: Sentiment Score by Narrative Phase\n")
ANOVA: Sentiment Score by Narrative Phase
Show code
print(anova_summary)
                  Df Sum Sq Mean Sq F value    Pr(>F)    
narrative_phase    2     47  23.256   10.27 0.0000357 ***
Residuals       3870   8767   2.265                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
if (anova_summary[[1]]$`Pr(>F)`[1] < 0.05) {
  cat("\nPost-hoc Tukey HSD:\n")
  tukey_result <- TukeyHSD(anova_result)
  print(tukey_result)
}

Post-hoc Tukey HSD:
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = sentiment_score ~ narrative_phase, data = clean_data)

$narrative_phase
                                               diff         lwr       upr
Phase 2: Debate-Phase 1: Emergence       0.24017336  0.11155332 0.3687934
Phase 3: Integration-Phase 1: Emergence  0.20828021 -0.07564788 0.4922083
Phase 3: Integration-Phase 2: Debate    -0.03189315 -0.32818983 0.2644035
                                            p adj
Phase 2: Debate-Phase 1: Emergence      0.0000366
Phase 3: Integration-Phase 1: Emergence 0.1978160
Phase 3: Integration-Phase 2: Debate    0.9654994
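Significance alone does not convey magnitude with n near 4,000, so an effect size is worth reporting alongside the F-test. Eta-squared (between-group sum of squares over total sum of squares) can be read straight off the ANOVA table; a sketch assuming `anova_result` from above:

```r
# Eta-squared: proportion of sentiment variance explained by narrative phase
ss <- summary(anova_result)[[1]]$`Sum Sq`
eta_sq <- ss[1] / sum(ss)
cat("Eta-squared =", round(eta_sq, 4), "\n")
# From the printed table (SS_between ~ 47, SS_residual ~ 8767), eta-squared is
# roughly 0.005: phase explains about 0.5% of sentiment variance, a small
# effect despite p < .001.
```

This is a useful caveat for the Discussion: the phase effect is statistically reliable but substantively modest.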

4 Discussion

4.1 Key Findings

4.1.1 Coverage Patterns

The analysis reveals substantial media attention to AI in education, with identifiable peaks corresponding to key events such as ChatGPT’s release and the beginning of school semesters.

4.1.2 Frame Dominance

Contrary to initial expectations of moral panic, the OPPORTUNITY and REGULATION frames predominate over the THREAT frame across most of the study period. This suggests Croatian media took a relatively pragmatic approach to the topic.

4.1.3 Narrative Evolution

The results support the hypothesized narrative arc, most clearly in its early stages: sentiment rises significantly from Phase 1 to Phase 2 (Tukey HSD, p < .001), while contrasts involving Phase 3 do not reach significance.

  • Phase 1 (Emergence): Higher threat framing, focus on plagiarism concerns
  • Phase 2 (Debate): Balanced discussion of risks and benefits
  • Phase 3 (Integration): Shift toward practical implementation
  • Phase 4 (Normalization): AI treated as a routine educational tool (hypothesized; not yet statistically distinguishable in the data)

4.1.4 Source Variation

Significant differences exist between outlet types:

  • Tabloids: Higher threat framing, more sensational coverage
  • Quality press: More balanced, policy-focused
  • Education specialists: Most nuanced, competence-focused

4.2 Limitations

  1. Dictionary-based analysis: May miss nuanced or novel framings
  2. Croatian language specificity: Dictionaries may not capture all relevant terms
  3. Web sources only: Excludes print, TV, and social media
  4. Automated sentiment: Simplified positive/negative classification
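The dictionary-matching limitation can be made concrete with a toy frame tagger. The term lists below are hypothetical stand-ins, not the study's actual Croatian dictionaries, and they illustrate the core weakness: any article using terms outside the lists goes untagged or mistagged.

```r
# Illustrative only: a minimal dictionary-based frame tagger
# (term lists are hypothetical, not the study's dictionaries)
frame_dict <- list(
  THREAT      = c("plagijat", "varanje", "prijetnja"),
  OPPORTUNITY = c("prilika", "inovacija", "učenje"),
  REGULATION  = c("pravilnik", "smjernice", "zabrana")
)

tag_frame <- function(text) {
  # Count how many terms from each frame's list occur in the text
  hits <- sapply(frame_dict, function(terms)
    sum(sapply(terms, function(t) grepl(t, text, ignore.case = TRUE))))
  if (all(hits == 0)) return(NA_character_)  # novel framing: silently missed
  names(which.max(hits))
}

tag_frame("Nove smjernice za upotrebu AI alata u školama")
```

A paraphrase expressing threat without any listed term would return `NA` here, which is exactly why manual validation (Section 4.3) is the natural next step.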

4.3 Future Directions

  1. Manual validation of frame classifications
  2. Extension to social media discourse
  3. Comparative analysis with other countries
  4. Longitudinal tracking as AI tools evolve

5 Conclusion

This analysis demonstrates that Croatian media coverage of AI in education has followed a discernible narrative arc from initial concern to pragmatic integration. While threat frames exist, they are outweighed by opportunity and regulatory framings. The findings suggest media discourse may be more nuanced than moral panic theory would predict, with significant variation across outlet types and over time.


6 Appendix: Technical Details

6.1 Session Information

Show code
sessionInfo()
R version 4.5.2 (2025-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=Croatian_Croatia.utf8  LC_CTYPE=Croatian_Croatia.utf8   
[3] LC_MONETARY=Croatian_Croatia.utf8 LC_NUMERIC=C                     
[5] LC_TIME=Croatian_Croatia.utf8    

time zone: Europe/Zagreb
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] progress_1.2.3            openxlsx_4.2.8.1         
 [3] broom_1.0.10              changepoint_2.3          
 [5] zoo_1.8-14                tidygraph_1.3.1          
 [7] ggraph_2.2.2              igraph_2.2.1             
 [9] kableExtra_1.4.0          knitr_1.50               
[11] viridis_0.6.5             viridisLite_0.4.2        
[13] RColorBrewer_1.1-3        ggrepel_0.9.6            
[15] patchwork_1.3.2           scales_1.4.0             
[17] ggthemes_5.2.0            ggplot2_4.0.1            
[19] quanteda.textplots_0.96.1 quanteda.textstats_0.97.2
[21] quanteda_4.3.1            tidytext_0.4.3           
[23] tibble_3.3.0              forcats_1.0.1            
[25] lubridate_1.9.4           stringr_1.6.0            
[27] tidyr_1.3.1               dplyr_1.1.4              

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1   farver_2.1.2       S7_0.2.1           fastmap_1.2.0     
 [5] tweenr_2.0.3       janeaustenr_1.0.0  digest_0.6.39      timechange_0.3.0  
 [9] lifecycle_1.0.4    tokenizers_0.3.0   magrittr_2.0.4     compiler_4.5.2    
[13] rlang_1.1.6        tools_4.5.2        yaml_2.3.11        labeling_0.4.3    
[17] prettyunits_1.2.0  stopwords_2.3      graphlayouts_1.2.2 htmlwidgets_1.6.4 
[21] xml2_1.5.1         withr_3.0.2        purrr_1.2.0        grid_4.5.2        
[25] polyclip_1.10-7    MASS_7.3-65        dichromat_2.0-0.1  cli_3.6.5         
[29] rmarkdown_2.30     crayon_1.5.3       generics_0.1.4     rstudioapi_0.17.1 
[33] cachem_1.1.0       ggforce_0.5.0      splines_4.5.2      vctrs_0.6.5       
[37] Matrix_1.7-4       jsonlite_2.0.0     hms_1.1.4          systemfonts_1.3.1 
[41] glue_1.8.0         stringi_1.8.7      gtable_0.3.6       pillar_1.11.1     
[45] htmltools_0.5.8.1  R6_2.6.1           textshaping_1.0.4  evaluate_1.0.5    
[49] lattice_0.22-7     backports_1.5.0    SnowballC_0.7.1    memoise_2.0.1     
[53] Rcpp_1.1.0         zip_2.3.3          fastmatch_1.1-6    svglite_2.2.2     
[57] nsyllable_1.0.1    gridExtra_2.3      nlme_3.1-168       mgcv_1.9-3        
[61] xfun_0.54          pkgconfig_2.0.3   

6.2 Data Export

Show code
# Export processed data for further analysis
write.xlsx(clean_data, "processed_data.xlsx")
write.xlsx(monthly_stats, "monthly_statistics.xlsx")
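Because the corpus contains Croatian diacritics (č, ć, š, ž, đ), a plain-text export with explicit encoding alongside the Excel files can help downstream tools that default to other encodings. A sketch (filenames are illustrative):

```r
# CSV with explicit UTF-8 encoding preserves Croatian diacritics
write.csv(clean_data, "processed_data.csv",
          row.names = FALSE, fileEncoding = "UTF-8")
```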

References